Descriptive Analysis & Preliminary Results

Introduction

The following document summarises the progress made thus far on Chapter 1: Local Fiscal Risks of Decarbonisation of my DPhil. The work aims to pursue a better understanding of how industrial transformation impacts local well-being. From an original interest in looking at all aspects of local public finance, the project has narrowed to focus on expenditure on public education and its connection to industrial prosperity and transformation.

Below, I provide some descriptive statistics and figures about the data, some preliminary regression tests of the significance of various GDP control variables, and some regression results using a standard TWFE model (with and without state-level trends) and coal mining activity as instruments for property taxes and GDP.

The main research question is: How does industrial transformation/activity impact county-level expenditure on public education?

Conclusion: I do not yet feel confident in any of the results below but hopefully these steps can help orient the next phase of analysis (ideally moving on to consider oil and gas industry treatment variables as well or treatments that proxy “industrial transformation” in a more general sense).

Data

All data used is reported annually at the county-level. Therefore, no time-invariant variables are included (apart from the State in which a county is in, which is made time-variant through the inclusion of a state-level trend in various models).

Expenditure and Revenue: The dependent variables of interest come from Willamette University’s Government Finance Database. The data includes county-level revenue and expenditure on public education including disaggregated values by revenue source (federal, state, or other intergovernmental revenue) and expenditure item (lunches, wages, debt). All values are reported in current US dollars. The data for property taxes collected used in regressions below also come from this dataset. Expenditure on vocational training and from Educational Service Agencies (ESAs) are also sourced from this dataset. The challenge of accounting for ESAs are shown in KR2 section below.

GDP Controls: US Bureau of Economic Analysis. Values are also reported in current US dollars (real GDP values exist). The controls used in the below are total, private industry, and oil, gas, mining & quarrying county-level GDP.

Population controls: US Census Bureau.

Coal mine activity and production levels: Mine Safety and Health Administration

Descriptive Statistics

The below summary table displays descriptive statistics for all indicators used in regression results below.

All values for total enrollment, revenue, taxes, expenditure are in thousand USD. Values for GDP are reported in million USD. Values for population are reported in thousands.

Statistic N Mean St. Dev. Min Max
esa_tot_exp 7,805 20 36 0 463
esa_tot_exp_pp 7,805 2 2 0 37
Enrollment 60,280 15 50 0 1,724
Total_Revenue 60,280 165 602 0 25,052
Total_Taxes 60,280 64 251 0 7,909
Property_Tax 60,280 61 247 0 7,909
Total_IG_Revenue 60,280 93 363 0 18,363
Total_Fed_IG_Revenue 60,280 1 4 0 137
Total_Expenditure 60,280 169 616 0 23,008
Total_Interest_on_Debt 60,280 5 24 0 1,006
Total_Revenue_pp 60,280 12 6 2 191
Total_Taxes_pp 60,280 4 5 0 165
Property_Tax_pp 60,280 4 5 0 165
Total_IG_Revenue_pp 60,280 7 3 0 116
Total_Fed_IG_Revenue_pp 60,280 0 1 0 13
Total_Expenditure_pp 60,280 12 6 2 186
Total_Interest_on_Debt_pp 60,280 0 0 0 10
pop_total 60,280 97 324 0 10,106
pop_school_age 60,280 20 66 0 2,219
gdp_total 57,540 4,847 20,834 4 836,163
gdp_priv_ind 57,540 4,259 18,804 3 752,486
gdp_govt 57,540 587 2,187 1 83,677
gdp_o_g_mining_quarr_21 57,540 94 569 0 27,694

Descriptive Figures: County-level education expenditure

The following plot displays each county’s time series of expenditure (total and per pupil log values) as well as the overall mean. Colors correspond to the US state the county is in.

Key Relationships (KR)

Below are scatterplots and diagrams depicting key relationships between dependent and control variables as well as shares/components of key variables.

KR 1: Revenue Sources

Most important to tease out before modelling is how different revenue sources (federal, state, county (own), and other local sources) interplay. County-level revenue for public education is a combination of both local and intergovernmental sources. The local portion of the share is almost entirely sourced through property taxes. The intergovernmental sources come from state, federal, and local aid.

Below chart plots county-level mean (taken over the panel time horizon) of different intergovernmental (IG) sources versus own-source revenue (generated from local sources). The solid black line represents a best-fit line and the dashed line represents a 45 degree line. The blue plot shows Total IG Revenue (Federal + State + Other Local) versus own revenue. There is a near negative correlation between the two (ie. they are near-substitutes for one another). This effect is dominated by State IG revenue (as can be seen in the purple panel).

The following plots show the share of revenue for public education that comes from own sources, local intergovernmental, state intergovernmental, and federal intergovernmental by state. Almost all states have a near 50% split of revenue from state and own sources, which aligns with data from the Congressional Research Service cited in the Transfer of Status report. Massachusetts has an unusually high share of local IG support (inter-school aid), eclipsing own sources almost completely. From further research, I believe this has to do with a unique structure of Massachusetts public school funding which is reliant on several multi-county funding agencies (similar to the mentioned ESAs). This anomaly might warrant the exclusion of MA from the analysis.

Conclusion: I am still unsure how best to control for the effects of these shares on our results.

The below plot provides the same information as above but on a national level.

Conclusion: Corroborates the near-even split of revenue between state and own sources which seems to be a fact of public education revenue. I mainly include this as an alternative summarising figure to the above.

KR 2: ESAs and County Expenditure

Around 2007, many states instituted Educational Service Agencies (ESAs) which sought to “equalise” public education across the country. To date, ~45 states have ESAs which are responsible for multiple school districts, most often across multiple counties. Therefore, when modelling county-level expenditure it will be important to understand how this change in educational expenditure affected county-level spending (ie. did ESAs replace or supplement county-level funding).

Only 593 counties of 2740 in the dataset have recorded revenue/expenditure from ESAs. After some digging, I believe that these values for ESAs are improperly recorded in the sense that the revenue is recorded only in counties in which the ESA’s headquarters is located and not partitioned to the counties to which the revenue ultimately flows. All county-level total expenditure/revenue values that are used in the regressions on this page have explicitly excluded recorded ESA values for this reason (they are instead recorded in a variable called esa_tot_exp or esa_tot_rev in total and per pupil values).

The below graphs show some relationships between ESA and county-level finances. I have yet to arrive at a definitive understanding of how ESAs and county-level finances interact. They do not appear to be substitutes. As is evident, the data on ESA expenditure is patchy and highly volatile by county. At the moment, I believe this is because of imperfect/inconsistent reporting in comparison to traditional school district reporting. The four states that have recorded ESA expenditure before 2007 are California, Illinois, Minnesota, Oregon.

With this we want to demonstrate how ESAs interact with our expenditure indicator. Does high ESA spending imply low/high local spending? It seems from the below that the two have a no correlation, implying little substitution? (Each point is a county in a particular year starting from 2007, colour represents state). Warm color-scale is (top-left panel) is expenditure values whereas the cool color scales are revenue values. The black line is a 45 degree line.

KR 3: Revenue, Expenditure and Property Taxes

We know the majority of “own source” revenue comes from property taxes. The below scatterplots demonstrate the relationship between education revenue and expeniture versus property taxes collected. The solid black line represents a best-fit line and the dashed line represents a 45 degree line (intercept-adjusted to match data).

KR 4: GDP and Property Taxes

The solid black line represents a best-fit line and the dashed line represents a 45 degree line (intercept-adjusted to match data).

GDP and Property taxes have a positive linear correlation, to be expected.

KR 5: GDP and Education Expenditure

The solid black line represents a best-fit line and the dashed line represents a 45 degree line (intercept-adjusted to match data).

GDP and Education expenditure have a positive linear correlation, as expected.

Preliminary Results

Panel testing

is.pbalanced(df, index = c("fips", "year"))
[1] TRUE

Descriptive regression models

The below models were run to test the relevance of various GDP control variables (total, private industry, oil, gas & mining) as well as corroborate some relationships in the above scatter plots of KRs. Ideally, once variables have been surveyed, they will be put through model selection via getspanel.

All regression models that follow include TWFE (county- and year- fixed effects) and standard errors clustered by county.

All functional forms in the feols() functions below are of the form “Y ~ X” In the cases in which multiple estimations are included via sw(Xa, Xb, Xc + Xd), the function will return results for “Y~Xa”, “Y~Xb”, “Y ~ Xc + Xd”.

To ease reading, I have ensured that no results table contains more than 4 models (to limit need to scroll horizontally).

Property Tax ~ GDP

GDP has a highly relevant relationship to property taxes. A 1% increase in GDP and GDP per capita leads to a 0.3% increase in property taxes collected.

list(feols(log_Property_Tax ~ log_gdp_total | fips + year, data = mines_df, se = "cluster", cluster = "fips"),
     feols(log_Property_Tax_pp ~ log_gdp_total_pc | fips + year, data = mines_df, se = "cluster", cluster = "fips")) %>% etable(se.below = TRUE) %>% kable
model 1 model 2
Dependent Var.: (log) Property Taxes (log) Property Taxes pp
(log) GDP 0.3430***
(0.0311)
(log) GDP pc 0.3020***
(0.0267)
Fixed-Effects: -------------------- -----------------------
fips Yes Yes
year Yes Yes
_______________ ____________________ _______________________
S.E.: Clustered by: fips by: fips
Observations 57,540 57,540
R2 0.95236 0.90885
Within R2 0.02007 0.03622

Education Expenditure ~ Revenue Sources

The below regressions are included to establish the relationship between education expenditure and its component parts. These regressions simply corroborate what is displayed in the section on Key Relationships above (ie. that the largest form of IG revenue is state funding and “Own Source” revenue is largely sourced from Property Taxes).

feols(log_Total_Educ_Current_Exp ~ sw(log_Property_Tax, 
                                       log_Property_Tax + log_Total_IG_Revenue,
                                       log_Property_Tax + log_Total_Fed_IG_Revenue + log_State_IGR_Education,
                                      log_Total_Rev_Own_Sources + log_Total_IG_Revenue) | fips + year, data = mines_df, se = "cluster", cluster = "fips") %>% etable(se.below = TRUE) %>% kable
..1 ..2 ..3 ..4
Dependent Var.: (log) Edu Exp (log) Edu Exp (log) Edu Exp (log) Edu Exp
(log) Property Taxes 0.0444*** 0.0572*** 0.0536***
(0.0068) (0.0077) (0.0078)
(log) IG Revenue 0.3409*** 0.3448***
(0.0140) (0.0129)
(log) Federal IG Rev. 0.0032***
(0.0006)
(log) State IG Rev 0.3108***
(0.0151)
(log) Rev. Own Sources 0.2072***
(0.0089)
Fixed-Effects: ------------- ------------- ------------- -------------
fips Yes Yes Yes Yes
year Yes Yes Yes Yes
______________________ _____________ _____________ _____________ _____________
S.E.: Clustered by: fips by: fips by: fips by: fips
Observations 57,540 57,540 57,540 57,538
R2 0.99497 0.99668 0.99653 0.99727
Within R2 0.03080 0.36064 0.33177 0.47345
feols(log_Total_Educ_Current_Exp_pp ~ sw(log_Total_Rev_Own_Sources_pp + log_Total_IG_Revenue_pp,
                                         log_Property_Tax_pp, 
                                         log_Property_Tax_pp + log_Total_IG_Revenue_pp, 
                                         log_Property_Tax_pp + log_Total_Fed_IG_Revenue_pp + log_State_IGR_Education_pp) | fips + year, data = mines_df, se = "cluster", cluster = "fips") %>% etable(se.below = TRUE) %>% kable
..1 ..2 ..3 ..4
Dependent Var.: (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp
(log) Rev. Own Sources pp 0.2045***
(0.0096)
(log) IG Revenue pp 0.2861*** 0.2862***
(0.0135) (0.0148)
(log) Property Taxes pp 0.1190*** 0.1399*** 0.1403***
(0.0140) (0.0144) (0.0142)
(log) Federal IG Rev. pp 0.0018***
(0.0004)
(log) State IG Rev pp 0.2629***
(0.0149)
Fixed-Effects: ---------------- ---------------- ---------------- ----------------
fips Yes Yes Yes Yes
year Yes Yes Yes Yes
_________________________ ________________ ________________ ________________ ________________
S.E.: Clustered by: fips by: fips by: fips by: fips
Observations 57,538 57,540 57,540 57,540
R2 0.94132 0.91035 0.93477 0.93269
Within R2 0.41466 0.10573 0.34929 0.32852

Education Expenditure ~ GDP

A 1% increase in GDP is associated with a 0.14% increase in education expenditure, dominated by the effect of GDP from private industry (0.12%). I include here also the GDP generated from the oil, gas, mining, and quarrying sector. The effect is statistically significant but small.

feols(log_Total_Educ_Current_Exp ~ sw(log_gdp_total,
                                      log_gdp_priv_ind,
                                      log_gdp_o_g_mining_quarr_21) | fips + year, data = mines_df, se = "cluster", cluster = "fips") %>% etable(se.below = TRUE) %>% kable
..1 ..2 ..3
Dependent Var.: (log) Edu Exp (log) Edu Exp (log) Edu Exp
(log) GDP 0.1456***
(0.0099)
(log) GDP Priv. Industry 0.1152***
(0.0083)
(log) GDP O&G&Mining 0.0037***
(0.0007)
Fixed-Effects: ------------- ------------- -------------
fips Yes Yes Yes
year Yes Yes Yes
________________________ _____________ _____________ _____________
S.E.: Clustered by: fips by: fips by: fips
Observations 57,540 57,540 57,540
R2 0.99510 0.99504 0.99482
Within R2 0.05650 0.04374 0.00279
feols(log_Total_Educ_Current_Exp_pp ~ sw(log_gdp_total_pc,
                                         log_gdp_priv_ind_pc,
                                         log_gdp_o_g_mining_quarr_21_pc) | fips + year, data = mines_df, se = "cluster", cluster = "fips") %>% etable(se.below = TRUE) %>% kable
..1 ..2 ..3
Dependent Var.: (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp
(log) GDP pc 0.0812***
(0.0083)
(log) GDP Priv. Industry pc 0.0674***
(0.0072)
(log) O&G&Mining pc 0.0177***
(0.0035)
Fixed-Effects: ---------------- ---------------- ----------------
fips Yes Yes Yes
year Yes Yes Yes
___________________________ ________________ ________________ ________________
S.E.: Clustered by: fips by: fips by: fips
Observations 57,540 57,540 57,540
R2 0.90172 0.90143 0.90008
Within R2 0.01956 0.01670 0.00325

Incorporating time lags

Education expenditure has a highly relevant time dependence. The effect of increases in GDP two years prior has the greatest effect on current education expenditure, implying a delayed effect of county-level economic prosperity/decline on public education expenditure.

Question: Is the above interpretation of the time dependence accurate?

levs_gdp_real_trend <- feols(log_Total_Educ_Current_Exp ~ sw(log_gdp_total + log_gdp_total_l1 + log_gdp_total_l2, log_gdp_priv_ind + log_gdp_priv_ind_l1 + log_gdp_priv_ind_l2) + time:as.factor(state) | fips + year, data = mines_df, se = "cluster", cluster = "fips")
levs_gdp_real_trend %>% etable(., drop = "time*", se.below = TRUE) %>% kable
..1 ..2
Dependent Var.: (log) Edu Exp (log) Edu Exp
(log) GDP 0.0347***
(0.0064)
(log,l1) GDP 0.0238***
(0.0043)
(log,l2) GDP 0.0806***
(0.0065)
(log) GDP Priv. Industry 0.0260***
(0.0054)
(log,l1) GDP Priv. Industry 0.0197***
(0.0036)
(log,l2) GDP Priv. Industry 0.0649***
(0.0057)
Fixed-Effects: ------------- -------------
fips Yes Yes
year Yes Yes
___________________________ _____________ _____________
S.E.: Clustered by: fips by: fips
Observations 52,060 52,060
R2 0.99641 0.99637
Within R2 0.21874 0.20940
feols(log_Total_Educ_Current_Exp ~ sw(log_gdp_total_pc + log_gdp_total_pc_l1 + log_gdp_total_pc_l2, 
                                      log_gdp_priv_ind_pc + log_gdp_priv_ind_pc_l1 + log_gdp_priv_ind_pc_l2,
                                      log_gdp_o_g_mining_quarr_21 + log_gdp_o_g_mining_quarr_21_l1 + log_gdp_o_g_mining_quarr_21_l2) + time:as.factor(state) | fips + year, data = mines_df, se = "cluster", cluster = "fips") %>%  etable(., drop = "time*", se.below = TRUE) %>% kable
..1 ..2 ..3
Dependent Var.: (log) Edu Exp (log) Edu Exp (log) Edu Exp
(log) GDP pc -0.0044
(0.0065)
(log,l1) GDP pc 0.0111*
(0.0046)
(log,l2) GDP pc 0.0323***
(0.0068)
(log) GDP Priv. Industry pc -0.0059
(0.0056)
(log,l1) GDP Priv. Industry pc 0.0086*
(0.0039)
(log,l2) GDP Priv. Industry pc 0.0267***
(0.0060)
(log) GDP O&G&Mining 0.0006
(0.0005)
(log,l1) GDP O&G&Mining 0.0015***
(0.0004)
(log,l2) GDP O&G&Mining 0.0022***
(0.0005)
Fixed-Effects: ------------- ------------- -------------
fips Yes Yes Yes
year Yes Yes Yes
______________________________ _____________ _____________ _____________
S.E.: Clustered by: fips by: fips by: fips
Observations 52,060 52,060 52,060
R2 0.99623 0.99623 0.99623
Within R2 0.17929 0.17842 0.17860
feols(log_Total_Educ_Current_Exp_pp ~ sw(log_gdp_total_pc + log_gdp_total_pc_l1 + log_gdp_total_pc_l2, 
                                      log_gdp_priv_ind_pc + log_gdp_priv_ind_pc_l1 + log_gdp_priv_ind_pc_l2,
                                      log_gdp_o_g_mining_quarr_21_pc + log_gdp_o_g_mining_quarr_21_pc_l1 + log_gdp_o_g_mining_quarr_21_pc_l2) + time:as.factor(state) | fips + year, data = mines_df, se = "cluster", cluster = "fips") %>%  etable(., drop = "time*", se.below = TRUE) %>% kable
..1 ..2 ..3
Dependent Var.: (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp
(log) GDP pc 0.0051
(0.0068)
(log,l1) GDP pc 0.0270**
(0.0084)
(log,l2) GDP pc 0.0387***
(0.0058)
(log) GDP Priv. Industry pc 0.0030
(0.0059)
(log,l1) GDP Priv. Industry pc 0.0223**
(0.0070)
(log,l2) GDP Priv. Industry pc 0.0333***
(0.0052)
(log) O&G&Mining pc -0.0002
(0.0037)
(log,l1) O&G&Mining pc -0.0011
(0.0044)
(log,l2) O&G&Mining pc 0.0194***
(0.0043)
Fixed-Effects: ---------------- ---------------- ----------------
fips Yes Yes Yes
year Yes Yes Yes
______________________________ ________________ ________________ ________________
S.E.: Clustered by: fips by: fips by: fips
Observations 52,060 52,060 52,060
R2 0.91310 0.91293 0.91218
Within R2 0.17190 0.17027 0.16307

Coal Mine & Production Data

Coal mine data exists from 1996-2021. The below is a preliminary treatment attempt using data on active coal production (total_active_prod) and active coal mines (total_active_n).

Question: might it be more interesting to look at the first difference of mining and production activity to understand the effect of a change in activity?

Standard TWFE panel regression

list(feols(log_Total_Educ_Current_Exp_pp ~ log_gdp_priv_ind_pc + log_gdp_priv_ind_pc_l1 + log_gdp_priv_ind_pc_l2 + diff_total_active_n + l1_diff_total_active_n + l2_diff_total_active_n | fips + year, data = mines_df, se = "cluster", cluster = "fips"),
feols(log_Total_Educ_Current_Exp_pp ~ sw(log_gdp_priv_ind_pc + log_gdp_priv_ind_pc_l1 + log_gdp_priv_ind_pc_l2 + diff_total_active_prod + l1_diff_total_active_prod + l2_diff_total_active_prod,
                                         log_gdp_total_pc + log_gdp_total_pc_l1 + log_gdp_total_pc_l2 + diff_total_active_n + l1_diff_total_active_n + l2_diff_total_active_n,
                                         log_gdp_total_pc + log_gdp_total_pc_l1 + log_gdp_total_pc_l2 + diff_total_active_prod + l1_diff_total_active_prod + l2_diff_total_active_prod) | fips + year, data = mines_df, se = "cluster", cluster = "fips")) %>% etable(se.below = TRUE) %>% kable
model 1 .1 .2 .3
Dependent Var.: (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp
(log) GDP Priv. Industry pc 0.0099 0.0096
(0.0064) (0.0064)
(log,l1) GDP Priv. Industry pc 0.0272*** 0.0273***
(0.0069) (0.0069)
(log,l2) GDP Priv. Industry pc 0.0485*** 0.0487***
(0.0056) (0.0056)
(fd) Active Coal Mines -0.0016* -0.0016*
(0.0008) (0.0008)
(l1, fd) Active Coal Mines -0.0016* -0.0016**
(0.0006) (0.0006)
(l2, fd) Active Coal Mines -0.0012* -0.0012*
(0.0006) (0.0006)
(fd) Active Coal Prod -2.1e-5 -1.97e-5
(0.0004) (0.0004)
(l1,fd) Active Coal Prod -0.0003 -0.0003
(0.0003) (0.0003)
(l2,fd) Active Coal Prod -0.0014*** -0.0014***
(0.0002) (0.0002)
(log) GDP pc 0.0138 0.0134
(0.0075) (0.0075)
(log,l1) GDP pc 0.0320*** 0.0321***
(0.0084) (0.0084)
(log,l2) GDP pc 0.0561*** 0.0563***
(0.0063) (0.0063)
Fixed-Effects: ---------------- ---------------- ---------------- ----------------
fips Yes Yes Yes Yes
year Yes Yes Yes Yes
______________________________ ________________ ________________ ________________ ________________
S.E.: Clustered by: fips by: fips by: fips by: fips
Observations 52,060 52,060 52,060 52,060
R2 0.89752 0.89751 0.89786 0.89786
Within R2 0.02339 0.02334 0.02668 0.02663
list(feols(log_Total_Educ_Current_Exp_pp ~ log_gdp_priv_ind_pc + log_gdp_priv_ind_pc_l1 + log_gdp_priv_ind_pc_l2 + l(total_active_n, 0:2)  | fips + year, data = mines_df, se = "cluster", cluster = "fips", panel.id = ~fips+year),
     feols(log_Total_Educ_Current_Exp_pp ~ sw(log_gdp_priv_ind_pc + log_gdp_priv_ind_pc_l1 + log_gdp_priv_ind_pc_l2 + l(total_active_prod, 0:2),
                                         log_gdp_total_pc + log_gdp_total_pc_l1 + log_gdp_total_pc_l2 + l(total_active_n, 0:2),
                                         log_gdp_total_pc + log_gdp_total_pc_l1 + log_gdp_total_pc_l2 + l(total_active_prod, 0:2)) | fips + year, data = mines_df, se = "cluster", cluster = "fips", panel.id = ~fips+year)) %>% etable(se.below = TRUE) %>% kable
model 1 .1 .2 .3
Dependent Var.: (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp (log) Edu Exp pp
(log) GDP Priv. Industry pc 0.0099 0.0095
(0.0065) (0.0064)
(log,l1) GDP Priv. Industry pc 0.0271*** 0.0272***
(0.0070) (0.0070)
(log,l2) GDP Priv. Industry pc 0.0486*** 0.0487***
(0.0056) (0.0056)
Active Coal Mines -0.0016 -0.0017
(0.0009) (0.0009)
(l1) Active Coal Mines 0.0001 0.0001
(0.0004) (0.0004)
(l2) Active Coal Mines 0.0012* 0.0012*
(0.0005) (0.0005)
Active Coal Prod 9.94e-8 1.01e-7
(4.16e-7) (4.13e-7)
(l1) Active Coal Prod -3.84e-7 -3.96e-7
(2.46e-7) (2.48e-7)
(l2) Active Coal Prod 4.34e-7 3.84e-7
(5.84e-7) (5.88e-7)
(log) GDP pc 0.0138 0.0133
(0.0075) (0.0075)
(log,l1) GDP pc 0.0320*** 0.0321***
(0.0084) (0.0084)
(log,l2) GDP pc 0.0562*** 0.0564***
(0.0063) (0.0063)
Fixed-Effects: ---------------- ---------------- ---------------- ----------------
fips Yes Yes Yes Yes
year Yes Yes Yes Yes
______________________________ ________________ ________________ ________________ ________________
S.E.: Clustered by: fips by: fips by: fips by: fips
Observations 52,060 52,060 52,060 52,060
R2 0.89751 0.89750 0.89786 0.89785
Within R2 0.02336 0.02326 0.02664 0.02654

Dealing with endogeneity

There is a significant endogeneity concern in using total active production and active mines as the treatment variable. Therefore, I have tried two instrumental variable approaches below and aim to add results using production- and employment-based Bartik instruments.

Instrumental Variable

We consider using total active production or total active mines as an instrument affecting education expenditure through property taxes or GDP. We know that property taxes have an endogenous relationship with education expenditure (it is a component of the public education budget), however, in theory mine closures/activity is unlikely to affect education expenditure, except via property taxes. We test this hypothesis below.

Conclusion: The results themselves are not promising (I think??).

Using coal production and mine activity as instruments for per capita property tax

To my eye, I don’t see any of these being useful - either they fail a first-stage F-test or the resulting coefficient on the treatment is statistically insignificant.

# Using total active production as an instrument for per capita property tax
feols(log_Total_Educ_Current_Exp_pp ~ log_gdp_total_pc | fips + year | log_Property_Tax_pp ~ total_active_prod, data = mines_df, se = "cluster", cluster = "fips")
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_Property_Tax_pp, Instr.: total_active_prod
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 57,540 
Fixed-effects: fips: 2,740,  year: 21
Standard-errors: Clustered (fips) 
                        Estimate Std. Error  t value Pr(>|t|) 
fit_log_Property_Tax_pp 0.058338   2.110886 0.027637  0.97795 
log_gdp_total_pc        0.063592   0.635469 0.100071  0.92030 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.08755     Adj. R2: 0.904207
                Within R2: 0.090263
F-test (1st stage), (log) Property Taxes pp: stat = 0.016802, p = 0.896867, on 1 and 57,517 DoF.
                                 Wu-Hausman: stat = 3.924e-4, p = 0.984196, on 1 and 54,777 DoF.
# Using the lag of total active production as an instrument for per capita property tax
feols(log_Total_Educ_Current_Exp_pp ~ log_gdp_total_pc | fips + year | log_Property_Tax_pp ~ l(total_active_prod,1), data = mines_df, se = "cluster", cluster = "fips", panel.id = ~fips+year)
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_Property_Tax_pp, Instr.: l(total_active_prod, 1)
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 54,800 
Fixed-effects: fips: 2,740,  year: 20
Standard-errors: Clustered (fips) 
                         Estimate Std. Error   t value Pr(>|t|) 
fit_log_Property_Tax_pp  0.605184    3.76925  0.160558  0.87245 
log_gdp_total_pc        -0.099781    1.12967 -0.088328  0.92962 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.146615     Adj. R2: 0.717744
                 Within R2: -1.59074
F-test (1st stage), (log) Property Taxes pp: stat = 0.027686, p = 0.86785, on 1 and 54,778 DoF.
                                 Wu-Hausman: stat = 0.050683, p = 0.82188, on 1 and 52,038 DoF.
# Using total active mines as an instrument for per capita property tax
feols(log_Total_Educ_Current_Exp_pp ~ log_gdp_total_pc | fips + year | log_Property_Tax_pp ~ total_active_n, data = mines_df, se = "cluster", cluster = "fips")
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_Property_Tax_pp, Instr.: total_active_n
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 57,540 
Fixed-effects: fips: 2,740,  year: 21
Standard-errors: Clustered (fips) 
                        Estimate Std. Error  t value Pr(>|t|) 
fit_log_Property_Tax_pp 0.212925   0.241578 0.881390  0.37818 
log_gdp_total_pc        0.016903   0.072317 0.233731  0.81521 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.089907     Adj. R2: 0.89898 
                 Within R2: 0.040629
F-test (1st stage), (log) Property Taxes pp: stat = 7.51204 , p = 0.006131, on 1 and 57,517 DoF.
                                 Wu-Hausman: stat = 0.575345, p = 0.448146, on 1 and 54,777 DoF.
# Using lag og total active mines as an instrument for per capita property tax
feols(log_Total_Educ_Current_Exp_pp ~ log_gdp_total_pc | fips + year | log_Property_Tax_pp ~ l(total_active_n,1), data = mines_df, se = "cluster", cluster = "fips", panel.id = ~fips+year)
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_Property_Tax_pp, Instr.: l(total_active_n, 1)
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 54,800 
Fixed-effects: fips: 2,740,  year: 20
Standard-errors: Clustered (fips) 
                        Estimate Std. Error  t value Pr(>|t|) 
fit_log_Property_Tax_pp 0.122852   0.344653 0.356451  0.72153 
log_gdp_total_pc        0.044127   0.102092 0.432227  0.66561 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.085713     Adj. R2: 0.903532
                 Within R2: 0.114552
F-test (1st stage), (log) Property Taxes pp: stat = 2.99452, p = 0.083553, on 1 and 54,778 DoF.
                                 Wu-Hausman: stat = 0.00109, p = 0.973658, on 1 and 52,038 DoF.
Using coal production and mine activity as instruments for per capita gdp

All pass a first-stage F-test with high statistic and low p-value; however, less than satisfactory on the Wu-Hausman endogeneity test.

Conclusion: This is the results I am most inclined to “report” in some way, even though it is weak. So I am curious for your thoughts on the validity (and blind spots I’ve missed when constructing the model).

# Using total active production as an instrument for per capita gdp
feols(log_Total_Educ_Current_Exp_pp ~ 1 | fips + year | log_gdp_total_pc ~ total_active_prod, data = mines_df, se = "cluster", cluster = "fips")
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_gdp_total_pc, Instr.: total_active_prod
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 57,540 
Fixed-effects: fips: 2,740,  year: 21
Standard-errors: Clustered (fips) 
                     Estimate Std. Error t value  Pr(>|t|)    
fit_log_gdp_total_pc 0.080496   0.026438 3.04476 0.0023508 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.090889     Adj. R2: 0.896764
                 Within R2: 0.019562
F-test (1st stage), (log) GDP pc: stat = 271.2     , p < 2.2e-16 , on 1 and 57,518 DoF.
                      Wu-Hausman: stat =   3.999e-4, p = 0.984045, on 1 and 54,778 DoF.
# Using the lag of total active production as an instrument for per capita gdp
feols(log_Total_Educ_Current_Exp_pp ~ 1 | fips + year | log_gdp_total_pc ~ l(total_active_prod,1), data = mines_df, se = "cluster", cluster = "fips", panel.id = ~fips+year)
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_gdp_total_pc, Instr.: l(total_active_prod, 1)
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 54,800 
Fixed-effects: fips: 2,740,  year: 20
Standard-errors: Clustered (fips) 
                     Estimate Std. Error t value  Pr(>|t|)    
fit_log_gdp_total_pc 0.090553   0.030083 3.01014 0.0026351 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.090229     Adj. R2: 0.893103
                 Within R2: 0.018802
F-test (1st stage), (log) GDP pc: stat = 258.8     , p < 2.2e-16 , on 1 and 54,779 DoF.
                      Wu-Hausman: stat =   0.069981, p = 0.791366, on 1 and 52,039 DoF.
# Using total active mines as an instrument for per capita gdp
feols(log_Total_Educ_Current_Exp_pp ~ 1 | fips + year | log_gdp_total_pc ~ total_active_n, data = mines_df, se = "cluster", cluster = "fips")
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_gdp_total_pc, Instr.: total_active_n
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 57,540 
Fixed-effects: fips: 2,740,  year: 21
Standard-errors: Clustered (fips) 
                     Estimate Std. Error  t value Pr(>|t|) 
fit_log_gdp_total_pc 0.033872   0.058377 0.580229  0.56181 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.091196     Adj. R2: 0.896064
                 Within R2: 0.012916
F-test (1st stage), (log) GDP pc: stat = 368.8    , p < 2.2e-16 , on 1 and 57,518 DoF.
                      Wu-Hausman: stat =   2.38163, p = 0.122775, on 1 and 54,778 DoF.
# Using lag og total active mines as an instrument for per capita gdp
feols(log_Total_Educ_Current_Exp_pp ~ 1 | fips + year | log_gdp_total_pc ~ l(total_active_n,1), data = mines_df, se = "cluster", cluster = "fips", panel.id = ~fips+year)
TSLS estimation, Dep. Var.: log_Total_Educ_Current_Exp_pp, Endo.: log_gdp_total_pc, Instr.: l(total_active_n, 1)
Second stage: Dep. Var.: log_Total_Educ_Current_Exp_pp
Observations: 54,800 
Fixed-effects: fips: 2,740,  year: 20
Standard-errors: Clustered (fips) 
                     Estimate Std. Error t value Pr(>|t|) 
fit_log_gdp_total_pc 0.063122   0.052963 1.19181  0.23344 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.090258     Adj. R2: 0.893034
                 Within R2: 0.018169
F-test (1st stage), (log) GDP pc: stat = 353.2     , p < 2.2e-16, on 1 and 54,779 DoF.
                      Wu-Hausman: stat =   0.311899, p = 0.57652, on 1 and 52,039 DoF.

Bartik Instrument

TBA: Results to be added